Document Clustering with User Feedback

Authors

  • Phi The Pham
  • Koen Deschacht
  • Marie-Francine Moens
Abstract

In this paper, we focus on the problem of incorporating user input into an automated document clustering process to improve clustering performance. Before the clustering process starts, the user can provide a small set of labeled documents that form the initial descriptions of the clusters. If the user provides no initial information, the clustering process has to form the initial cluster descriptions by itself. At the end of the clustering process, the user can navigate the clusters, assess the clustering quality and, if necessary, provide feedback to the clustering process. With this user feedback, the clustering process retrains itself to obtain new clusters that best describe the nature of the data and the wishes of the user. The results show that our methods for initializing the clustering process are valuable and that user feedback improves the quality of the clusters.
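
The loop described in the abstract (seed the clusters from a few labeled documents, cluster, then re-cluster after user feedback) can be illustrated with a minimal sketch. This is not the authors' method, which the abstract does not specify; it is a plain seeded k-means-style procedure over bag-of-words vectors with cosine similarity, and every name in it (`vectorize`, `cosine`, `centroid`, `seeded_cluster`, the toy documents and labels) is a hypothetical choice made for illustration only.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts for one document."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Summed term counts; cosine ignores scale, so no averaging is needed."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return total


def seeded_cluster(docs, seeds, iterations=10):
    """Cluster docs, starting from user-labeled seeds.

    docs  -- list of document strings
    seeds -- dict mapping document index -> cluster label (user input)
    Labeled documents stay pinned to the label the user gave them.
    """
    vecs = [vectorize(d) for d in docs]
    labels = sorted(set(seeds.values()))
    # Initial cluster descriptions come only from the labeled documents.
    cents = {c: centroid([vecs[i] for i, l in seeds.items() if l == c])
             for c in labels}
    assign = None
    for _ in range(iterations):
        new_assign = [
            seeds[i] if i in seeds
            else max(labels, key=lambda c: cosine(v, cents[c]))
            for i, v in enumerate(vecs)
        ]
        if new_assign == assign:  # converged
            break
        assign = new_assign
        cents = {c: centroid([vecs[i] for i, l in enumerate(assign) if l == c])
                 for c in labels}
    return assign


docs = [
    "stock market shares trading",
    "football match goal team",
    "market trading prices fall",
    "team wins football cup",
    "market for cup final tickets",
]
# Before clustering: the user labels one document per cluster.
first = seeded_cluster(docs, {0: "finance", 1: "sport"})
# After inspecting the clusters: the user corrects document 4 and re-runs.
second = seeded_cluster(docs, {0: "finance", 1: "sport", 4: "sport"})
print(first)
print(second)
```

In this toy run, the user feedback is modeled as one extra labeled document that is merged into the seed set before clustering is repeated. Other feedback forms, such as pairwise must-link or cannot-link constraints, would need a different update rule.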

Similar resources

RRLUFF: Ranking function based on Reinforcement Learning using User Feedback and Web Document Features

The principal aim of a search engine is to provide sorted results according to the user's requirements. To achieve this aim, it employs ranking methods to rank web documents based on their significance and relevance to the user's query. The novelty of this paper is a user-feedback-based ranking algorithm using reinforcement learning. The proposed algorithm is called RRLUFF, in which the rank...

Web pages ranking algorithm based on reinforcement learning and user feedback

The main challenge of a search engine is ranking web documents to provide the best response to a user's query. Despite the huge number of results extracted for a user's query, users examine only the first few; therefore, placing the relevant results in the top ranks is of great importance. In this paper, a ranking algorithm based on the reinforcement le...

Document Similarity Judgment for Interactive Document Clustering

This paper investigates the task of document similarity judgment for interactive document clustering. We suppose that one of the promising approaches for developing the next generation of web search engines is to incorporate a user feedback mechanism into constrained clustering. As a basis for designing such search engines, it is important to study interface designs that can reduce the user's burden of givi...

Deciphering cluster representations

There are several recent studies that propose search output clustering as an alternative representation method to ranked output. Users are provided with cluster representations instead of lists of titles and invited to make decisions on groups of documents. This paper discusses the difficulties involved in representing clusters for users' evaluation in a concise but easily interpretable form. The...

UCSC at Relevance Feedback Track

The relevance feedback track in TREC 2009 focuses on two subtasks: actively selecting good documents for users to provide relevance feedback on, and retrieving documents based on user relevance feedback. For the first task, we tried a clustering-based method and the Transductive Experimental Design (TED) method proposed by Yu et al. [5]. For the clustering-based method, we use the K-means algorithm to...

Publication date: 2008